Although the minimum mean square error (MMSE) approach is recognized to be near
optimal for the uplink of large-scale multiple-input multiple-output (MIMO)
systems, it involves a matrix inversion that is costly to compute. The long
recurrence enlarged conjugate gradient (LRE-CG) approach is proposed in this
study as a way to realize the MMSE algorithm iteratively while avoiding matrix
inversion. In addition, a diagonal-approximate initial solution is used with the
LRE-CG approach to speed up convergence and reduce complexity. The
LRE-CG-based approach is shown to reduce computational complexity
significantly. Simulation results show that it outperforms well-established
methods such as the Neumann series approximation-based method and the
Gauss-Seidel iterative method. With a small number of iterations, the suggested
approach achieves the near-optimal performance of a standard MMSE algorithm.
1. INTRODUCTION

Many communication systems, including the fourth generation (4G) cellular system, the IEEE 802.11n
wireless local area network system [1], [2], and long-term evolution advanced (LTE-A) [1], [2], have
demonstrated the benefits of multiple-input multiple-output (MIMO) technology. Large-scale MIMO [3] has
received widespread attention from communication specialists as a promising core technology that could be
applied to a variety of wireless communication systems in the near future. Large-scale MIMO differs from the
more common small-scale MIMO technology. In LTE-A, regular MIMO is typically equipped with eight
antennas, whereas large-scale MIMO uses a very large number of antennas, for example 128 or more. These
antennas at the base station can serve many user equipment devices simultaneously [4]. Theoretical results
indicate that large-scale MIMO systems can achieve high energy efficiency together with orders-of-magnitude
increases in spectral efficiency [5].

Evaluating the practical advantages of large-scale MIMO has revealed several difficulties. One
example is the design of a practical uplink signal detection algorithm that performs well in the presence of
multiuser interference. The complexity of the optimal maximum likelihood (ML) detector grows rapidly with
the number of transmit antennas [6], which makes it impractical for large-scale MIMO systems. To achieve
near-optimal performance at reduced complexity, nonlinear detection algorithms such as fixed-complexity
sphere decoding [7] and tabu search [8] have been proposed. However, their complexity remains a concern
when the MIMO system is large or the modulation order is high [9] (for example, 128 antennas at the base
station and 64-quadrature amplitude modulation (QAM)). In the traditional cell-free massive MIMO
topology, every user equipment (UE) in the coverage area is served by every access point (AP) within
communication range [10]-[12]. In [13], a taxonomy is proposed for uplink receiver cooperation between the
APs and the central processing unit (CPU), ranging from entirely distributed (level 2) to fully centralized
(level 4) implementations, with higher levels corresponding to more cooperation. Scalability issues in channel
estimation, data decoding/precoding, and fronthaul signaling have been highlighted in recent work [14], [15],
and they must be overcome to enable large-scale deployments of cell-free networks. Because most APs in a
wide coverage area have negligible channel gains to any specific UE, researchers have proposed user-centric
approaches in which each UE is served by only a subset of the APs [16], [17].

To keep complexity low while maintaining high performance, low-complexity linear detection
algorithms such as zero-forcing (ZF) and minimum mean square error (MMSE) can be used in the uplink of
multiuser large-scale MIMO systems [18]. MMSE detection achieves near-optimal performance in this
setting, but it requires a matrix inversion that is computationally expensive. The Neumann series
approximation approach was recently introduced to convert the matrix inversion into a series of
matrix-vector multiplications [19]. Although this algorithm can reduce complexity, the reduction is not
substantial.
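
The idea behind the Neumann series approach can be sketched as follows. The notation is assumed here for illustration and is not taken from [19]: $\mathbf{W}$ is the matrix to be inverted, $\mathbf{D}$ is its diagonal part, $\mathbf{R} = \mathbf{W} - \mathbf{D}$ is its off-diagonal part, and $k$ is the number of retained terms.

```latex
\mathbf{W}^{-1}
  = \left(\mathbf{I} + \mathbf{D}^{-1}\mathbf{R}\right)^{-1}\mathbf{D}^{-1}
  = \sum_{n=0}^{\infty}\left(-\mathbf{D}^{-1}\mathbf{R}\right)^{n}\mathbf{D}^{-1}
  \approx \sum_{n=0}^{k-1}\left(-\mathbf{D}^{-1}\mathbf{R}\right)^{n}\mathbf{D}^{-1}
```

The series converges when the spectral radius of $\mathbf{D}^{-1}\mathbf{R}$ is below one. For $k = 2$ the approximation is $\mathbf{D}^{-1} - \mathbf{D}^{-1}\mathbf{R}\mathbf{D}^{-1}$, which needs only diagonal scalings; retaining more terms introduces products between full matrices, which is why the saving shrinks as $k$ grows.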

The complexity of a linear detector with exact matrix inversion grows with the number of users in
large-scale MIMO systems and becomes prohibitively expensive. A number of studies have used the
Neumann series expansion (NSE) to approximate the exact matrix inversion [20]-[27]. However, when the
number of NSE terms is greater than 2, the complexity again increases significantly. Other iterative linear
algorithms have been suggested recently to achieve a better balance between performance and complexity,
including the conjugate gradient (CG) method [28]-[32], the Gauss-Seidel (GS) algorithm [33]-[35], and the
successive over-relaxation (SOR) algorithm [36]-[38], among others. These techniques are considered useful
for achieving good MIMO detection with lower complexity. Pyla et al. [39] propose to bring the dynamic
cooperative grouping (DCC) methodology from the connectivity MIMO research [40], [41] into cell-free
massive MIMO. The AP groups that serve different UEs may overlap, and the groups are chosen based on
the demands of the users.

Jiang et al. [42] take the position that dynamic cooperative grouping may be used with both
centralized (level 4) and completely distributed (level 2) uplink implementations in the same network.
However, the level 3 implementation (based on the taxonomy in [43]) has not been addressed with DCC
because it is not required. At level 3, the CPU adds a second layer of decoding, known as large-scale fading
decoding (LSFD), in order to reduce interference. Compared with level 2 in the original cell-free massive
MIMO [44], this distributed processing technique has been shown to enhance the spectral efficiency (SE)
significantly, but it has not been investigated in user-centric networks. Level 4 achieves the best SE among
the levels, but it requires the CPU to compute centralized receive combiners whose dimensions are much
larger than those of the local combiners used at levels 2 and 3, which increases its computational complexity.

To address the matrix inversion problem described above, we propose in this work a low-complexity
signal detection technique for large-scale MIMO systems that requires no matrix inversion. The technique is
based on the long recurrence enlarged conjugate gradient (LRE-CG) method [45], whose low computational
complexity makes it suitable for large-scale MIMO systems. Rather than computing new search directions at
each iteration, the method builds an orthonormal basis for a Krylov subspace of large dimension. The full
basis is used to update the solution, which avoids costly matrix inversions and is expected to improve the
convergence rate. The convergence of the proposed signal detection method is also demonstrated in this
work, which supports its practical viability. Simulation results verify that the proposed approach handles the
matrix inversion problem efficiently within the iterative procedure until the required accuracy is attained. To
the best of our knowledge, based on a survey of the current literature, this paper is the first attempt to employ
the LRE-CG approach for signal detection in an uplink large-scale MIMO system [46], [47].
The remainder of this paper is organized as follows. Section 2 gives a brief overview of the system
model. Section 3 describes the proposed low-complexity signal detection method, together with its
convergence and a complexity analysis. Section 4 presents the bit error rate (BER) simulation results for the
proposed system. Section 5 concludes the paper.

In this paper, lowercase and uppercase boldface letters denote vectors and matrices, respectively;
$(\cdot)^T$, $(\cdot)^H$, $(\cdot)^{-1}$, and $|\cdot|$ denote the transpose, conjugate transpose, matrix
inversion, and absolute value operators, respectively; $\mathrm{Re}\{\cdot\}$ and $\mathrm{Im}\{\cdot\}$
denote the real part and imaginary part of a complex number, respectively; and $\mathbf{I}_N$ is the
$N \times N$ identity matrix.


2. SYSTEM MODEL 

We consider an uplink large-scale MIMO system in which N antennas are employed at the base
station (BS) and K selected single-antenna UE devices are served simultaneously. We assume $N \gg K$,
e.g., N = 128 and K = 16 [31]. The bit streams of the K users are transmitted in parallel; each user's bits are
first encoded separately by a channel encoder and then mapped to symbols taken from the energy-normalized
modulation constellation $Q$. Let $\mathbf{s}$ denote the $K \times 1$ transmitted signal vector, which
contains the symbols from all K users, and let $\mathbf{H} \in \mathbb{C}^{N \times K}$ denote the flat
Rayleigh fading channel matrix, whose entries are independent and identically distributed with zero mean and
unit variance. The $N \times 1$ received signal vector $\mathbf{y}$ can be expressed as:

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{1}$$

In (1), $\mathbf{n}$ is an $N \times 1$ additive white Gaussian noise vector whose entries follow
$\mathcal{CN}(0, \sigma^2)$. Multiuser signal detection is performed at the BS to estimate the transmitted
signal vector $\mathbf{s}$ from the noisy received vector $\mathbf{y}$. Note that the channel matrix
$\mathbf{H}$ can usually be obtained through time-domain and frequency-domain training pilots [42], [43].
The estimate $\hat{\mathbf{s}}$ of the transmitted signal vector obtained by the MMSE linear detection
method can be expressed as (2):


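$$\hat{\mathbf{s}} = (\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_K)^{-1}\mathbf{H}^H\mathbf{y} = \mathbf{W}^{-1}\mathbf{y}^{MF} \tag{2}$$

Here $\mathbf{y}^{MF} = \mathbf{H}^H\mathbf{y}$ is the matched-filter output of $\mathbf{y}$, and the MMSE
filtering matrix $\mathbf{W}$ is given by (3):

$$\mathbf{W} = \mathbf{G} + \sigma^2\mathbf{I}_K \tag{3}$$

where $\mathbf{G} = \mathbf{H}^H\mathbf{H}$ is the Gram matrix. The estimate is used to derive the
log-likelihood ratios (LLRs) of the transmitted bits for soft-input channel decoding. Define the equivalent
channel matrix $\mathbf{E} = \mathbf{W}^{-1}\mathbf{G}$ and
$\mathbf{U} = \mathbf{W}^{-1}\mathbf{H}^H(\mathbf{W}^{-1}\mathbf{H}^H)^H = \mathbf{W}^{-1}\mathbf{G}\mathbf{W}^{-1}$.
Combining (1) and (2), the MMSE estimate $\hat{\mathbf{s}}$ is: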


$$\hat{\mathbf{s}} = \mathbf{E}\mathbf{s} + \mathbf{W}^{-1}\mathbf{H}^H\mathbf{n} \tag{4}$$

and its $k$-th element can be written as

$$\hat{s}_k = \mu_k s_k + e_k \tag{5}$$

where $s_k$ denotes the $k$-th element of the transmitted signal vector $\mathbf{s}$, $\mu_k = E_{kk}$ denotes
the equivalent channel gain, $e_k$ is the noise-plus-interference (NPI) term, and
$\nu_k^2 = \sum_{m \neq k} |E_{mk}|^2 + U_{kk}\sigma^2$ is the NPI variance; $U_{kk}$ denotes the entry of
$\mathbf{U}$ in the $k$-th row and $k$-th column, and $E_{mk}$ denotes the entry of $\mathbf{E}$ in the
$m$-th row and $k$-th column. The LLR $L_{k,b}$ of bit $b$ for the $k$-th user can be expressed as [24]:
the sets that consist of the symbols of the constellation Q.

The above shows that the MMSE linear detection algorithm is near-optimal for the uplink of
multiuser large-scale MIMO systems. However, the MMSE algorithm cannot avoid the costly matrix
inversion $\mathbf{W}^{-1}$. Computing the final LLRs for soft-input channel decoding requires the MMSE
estimate, the channel gains, and the noise-plus-interference (NPI) variances, all of which depend on this
matrix inversion. The computational complexity of the matrix inversion is $O(K^3)$, which is high because K
is typically large in uplink large-scale MIMO systems [21].
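
As an illustration of where the inversion enters, the following minimal sketch evaluates (1) to (3) together with the channel gains and NPI variances on a small random instance. It is not the paper's simulation code: the sizes N = 32 and K = 4, the noise variance, the QPSK symbols, and all function and variable names are assumptions made for this example, and plain Python lists are used so that the block has no external dependencies.

```python
import random

def hermitian(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[r][c].conjugate() for r in range(len(M))] for c in range(len(M[0]))]

def matmul(A, B):
    """Product of two list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (cubic cost in the matrix size)."""
    k = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(M)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def complex_gaussian(variance):
    """One sample of CN(0, variance)."""
    std = (variance / 2) ** 0.5
    return complex(random.gauss(0, std), random.gauss(0, std))

random.seed(0)
N, K, sigma2 = 32, 4, 0.1  # illustrative sizes, not the paper's setup

H = [[complex_gaussian(1.0) for _ in range(K)] for _ in range(N)]
s = [complex(random.choice((-1, 1)), random.choice((-1, 1))) / 2 ** 0.5
     for _ in range(K)]    # unit-energy QPSK symbols (assumption)
n = [complex_gaussian(sigma2) for _ in range(N)]
y = [hs + nk for hs, nk in zip(matvec(H, s), n)]        # eq. (1)

Hh = hermitian(H)
G = matmul(Hh, H)                                        # Gram matrix
W = [[G[i][j] + (sigma2 if i == j else 0.0) for j in range(K)]
     for i in range(K)]                                  # eq. (3)
W_inv = inverse(W)                                       # the costly step
y_mf = matvec(Hh, y)                                     # matched-filter output
s_hat = matvec(W_inv, y_mf)                              # eq. (2)

E = matmul(W_inv, G)                                     # equivalent channel matrix
U = matmul(E, W_inv)                                     # W^-1 G W^-1
mu = [E[k][k].real for k in range(K)]                    # channel gains
nu2 = [sum(abs(E[m][k]) ** 2 for m in range(K) if m != k) + U[k][k].real * sigma2
       for k in range(K)]                                # NPI variances
```

Every quantity after `W_inv` depends on the explicit inverse, which is the step an iterative solver is meant to replace.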

We therefore propose a low-complexity signal detection technique in which the iterative LRE-CG
algorithm is employed to compute the MMSE estimate without matrix inversion. A diagonal-approximate
initial solution [25] is used with the LRE-CG method to accelerate convergence and reduce complexity. We
also propose to estimate the channel gain and NPI variance for the LLR computation with an approximate
method that does not require the exact matrix inversion. Finally, an overall analysis of the proposed LRE-CG
algorithm is presented to demonstrate its advantages over conventional methods found in the literature.


3. THE PROPOSED TECHNIQUE 

For an uplink large-scale MIMO system, the columns of the channel matrix $\mathbf{H}$ are
asymptotically orthogonal and $\mathbf{H}$ has full column rank. This guarantees that the MMSE filtering
matrix $\mathbf{W}$ is Hermitian positive definite. Because of this property, the LRE-CG technique [15] can
be used to solve (2) iteratively without matrix inversion. The LRE-CG technique solves the N-dimensional
linear equation $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A}$ is an N-dimensional Hermitian
positive definite matrix, $\mathbf{x}$ is the N-dimensional solution vector, and $\mathbf{b}$ is the
N-dimensional measurement vector. Unlike the direct method, which computes
$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$ explicitly, the LRE-CG approach solves
$\mathbf{A}\mathbf{x} = \mathbf{b}$ iteratively, which keeps the complexity low. Since $\mathbf{W}$ is
Hermitian positive definite, it can be decomposed as

$$\mathbf{W} = \mathbf{D} + \mathbf{L} + \mathbf{L}^H \tag{7}$$

where $\mathbf{D}$ and $\mathbf{L}$ are the diagonal and strictly lower triangular parts of $\mathbf{W}$,
respectively. The LRE-CG technique is then used to estimate the transmitted signal vector $\mathbf{s}$.
LRE-CG is a Krylov projection technique for solving a linear system of equations [15]. Without matrix
inversion, the LRE-CG approach can solve the system in (2) by addressing the following optimization problem,


$$\hat{\mathbf{s}} = \arg\min_{\tilde{\mathbf{s}}} \|\mathbf{H}^H\mathbf{y} - \mathbf{A}\tilde{\mathbf{s}}\| \tag{8}$$

where $\mathbf{A} = \mathbf{H}^H\mathbf{H} + N_0\mathbf{I}_U \in \mathbb{C}^{U \times U}$ is a positive
definite matrix that represents the regularized uplink Gram matrix. Here U denotes the number of users (K in
section 2) and $N_0$ denotes the noise variance ($\sigma^2$ in section 2). The solution can be computed
iteratively with the method in [15], using the LRE-CG technique at low computational cost. At the $i$-th
iteration, LRE-CG updates the estimate of the transmitted signal vector as

$$\hat{\mathbf{s}}_i = \hat{\mathbf{s}}_{i-1} + \mathbf{P}_i\boldsymbol{\alpha}_i \tag{9}$$

where $\mathbf{P}_i$ is the $U \times t$ matrix that consists of the t subdomain search directions, and
$\boldsymbol{\alpha}_i$ is a vector of size t. Our LRE-CG-based technique for soft-output data detection is
summarized in Algorithm 1, which is based on the algorithm in [15]. With a small, fixed number of iterations,
the proposed algorithm reduces the complexity of the MMSE technique from $O(K^3)$ to $O(K^2)$.
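Algorithm 1. LRE-CG for soft-output MMSE detection (flop count of each step in brackets)
Input:
$\mathbf{A}$, the $n \times n$ symmetric positive definite matrix
$\mathbf{b}$, the $n \times 1$ observed vector
$\mathbf{x}_0$, the initial guess
$\epsilon$, the stopping tolerance
$t$, the number of subdomains (search directions)
$Itr_{\max}$, the maximum allowed iterations
Output:
$\mathbf{x}_{Itr}$, the approximate solution
1. $\mathbf{r} = \mathbf{b} - \mathbf{A}\mathbf{x}_0$, $Itr = 1$ [$2nnz - 1$]
2. $\mathbf{W} = T(\mathbf{r})$, $\mathbf{Q} = \mathbf{W}$ [$2nnz + n(t - 1)$]
3. A-orthonormalize $\mathbf{P}$
4. While ($Itr < Itr_{\max}$) do
5. $\mathbf{G} = \mathbf{Q}^T\mathbf{A}\mathbf{Q}$ [$(2nnz - n)t + (2n - 1)t^2$]
6. $\boldsymbol{\alpha} = \mathbf{G}^{-1}(\mathbf{Q}^T\mathbf{r})$ [$(2n - 1)n$]
7. $\mathbf{x}_{Itr} = \mathbf{x}_{Itr-1} + \mathbf{Q}\boldsymbol{\alpha}$ [$2nt$]
8. $\mathbf{r} = \mathbf{r} - \mathbf{A}\mathbf{Q}\boldsymbol{\alpha}$ [$2nt$]
9. $\mathbf{W} = \mathbf{A}\mathbf{W}$ [$(2nnz - n)t$]
10. A-orthonormalize $\mathbf{W}$ using modified Gram-Schmidt [$nnz\,t^2 + nt^2$]
12. $\mathbf{Q} = [\mathbf{Q}\ \mathbf{W}]$ [$(2nnz - n)t$]
13. $Itr = Itr + 1$ [$1$]
14. End While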
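
The steps of Algorithm 1 can be sketched in code as follows. This is a hedged illustration in the spirit of Algorithm 1, not the authors' implementation: the function names, the contiguous index blocks used for the splitting operator T, the problem sizes, and the random stand-in for the received vector are all assumptions. One simplification is also assumed: because the basis is kept A-orthonormal, the small matrix G of step 5 is the identity in exact arithmetic, so the coefficients are computed as the inner products of the basis vectors with the residual instead of solving with G.

```python
import random

def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def vdot(u, v):
    """Inner product u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def split_by_subdomain(r, t):
    """T(r): column i keeps the entries of r in index block i and is zero elsewhere."""
    n = len(r)
    bounds = [round(i * n / t) for i in range(t + 1)]
    return [[r[j] if bounds[i] <= j < bounds[i + 1] else 0j for j in range(n)]
            for i in range(t)]

def lre_cg_sketch(A, b, x0, t, max_iter, tiny=1e-28):
    """Enlarged Krylov projection with a growing A-orthonormal basis."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    W = split_by_subdomain(r, t)
    Q, AQ = [], []  # basis vectors q and their images A q
    for _ in range(max_iter):
        block = []
        for w in W:
            # Modified Gram-Schmidt in the A-inner product: q^H A w = (A q)^H w.
            for q, Aq in zip(Q, AQ):
                c = vdot(Aq, w)
                w = [wi - c * qi for wi, qi in zip(w, q)]
            Aw = matvec(A, w)
            norm2 = vdot(w, Aw).real
            if norm2 <= tiny:  # direction already contained in the basis
                continue
            scale = norm2 ** 0.5
            w = [wi / scale for wi in w]
            Aw = [v / scale for v in Aw]
            Q.append(w)
            AQ.append(Aw)
            block.append(Aw)
        # Update the solution and residual with the full basis.
        alpha = [vdot(q, r) for q in Q]
        for a, q, Aq in zip(alpha, Q, AQ):
            x = [xi + a * qi for xi, qi in zip(x, q)]
            r = [ri - a * aqi for ri, aqi in zip(r, Aq)]
        W = block  # next block is A times the newest basis vectors
    return x

def gauss():
    """One sample of CN(0, 1)."""
    return complex(random.gauss(0, 0.5 ** 0.5), random.gauss(0, 0.5 ** 0.5))

random.seed(1)
N, K, sigma2 = 64, 8, 0.1  # illustrative sizes (assumption)
H = [[gauss() for _ in range(K)] for _ in range(N)]
y = [gauss() for _ in range(N)]  # random stand-in for the received vector
A = [[sum(row[i].conjugate() * row[j] for row in H) + (sigma2 if i == j else 0.0)
      for j in range(K)] for i in range(K)]  # regularized Gram matrix
b = [sum(row[k].conjugate() * yi for row, yi in zip(H, y)) for k in range(K)]  # H^H y
x0 = [b[k] / A[k][k] for k in range(K)]  # diagonal-approximate initial solution
x = lre_cg_sketch(A, b, x0, t=2, max_iter=4)
res = [bi - ai for bi, ai in zip(b, matvec(A, x))]
rel_res = (vdot(res, res).real / vdot(b, b).real) ** 0.5
```

With t = 2 directions per iteration and four iterations, the basis in this small instance spans the whole 8-dimensional space, so the relative residual `rel_res` drops to rounding level; in a realistic setting the iteration would be stopped much earlier.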


3.1. Computational complexity 

The computational complexity of Algorithm 1 is assessed by counting the number of multiplications
in each step of the proposed approach. At each iteration, the total number of multiplications is given by:

$$O = 4Ut^2 + 8Ut + 2U$$

where t denotes the number of search directions. Since the presented method aims to decrease the
computational cost, $Itr_{\max}$ should be considerably smaller than U so that the computational complexity
of the presented algorithm is less than $O(U^3)$. The computational complexity of various approaches from
the literature is examined for comparison in the next section.
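
As a small worked example of this count, the sketch below evaluates the per-iteration expression and compares the total with the cubic reference. The helper name and the values U = 16, t = 2, and three iterations are assumptions chosen for illustration; the paper does not state the value of t used in its experiments.

```python
def lre_cg_multiplications(num_users, t, iterations):
    """Total multiplications: (4*U*t^2 + 8*U*t + 2*U) per iteration."""
    per_iteration = 4 * num_users * t ** 2 + 8 * num_users * t + 2 * num_users
    return per_iteration * iterations

U, t, itr_max = 16, 2, 3  # illustrative values (assumption)
total = lre_cg_multiplications(U, t, itr_max)
cubic_reference = U ** 3

print(total, cubic_reference)  # prints "1632 4096"
```

For these values the iterative count stays well below the cubic reference, and the gap widens as U grows while t and the iteration count stay fixed.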


4. RESULTS AND DISCUSSION 

Monte Carlo simulations were carried out in a coded 20-MHz multiple-input multiple-output
orthogonal frequency-division multiplexing (MIMO-OFDM) uplink system with 2,048 subcarriers to evaluate
the error-rate performance of the proposed technique. Of these subcarriers, 1,200 are used for data
transmission, as in LTE-Advanced (LTE-A) [31] and other networks. The 64-QAM modulation scheme is
used with Gray mapping and a rate-3/4 turbo code. The channel matrices were generated with the WINNER
Phase 2 model [34] with 7.8 cm antenna spacing to capture spatial and frequency correlation, similar to the
models used in [11] and [22]. A logarithmic maximum a posteriori (Log-MAP) turbo decoder is used for
channel decoding. The bit error rate is calculated by coding over one OFDM symbol with 1,200 data
subcarriers. We concentrate on a number of massive MIMO detection schemes. The experiments were
conducted in MATLAB on an Intel Core i7 CPU running at 2.4 GHz with 4 GB of RAM.

Figure 1 compares the bit error rate (BER) of the presented method with that of other exact and
approximate data-detection algorithms used for massive MU-MIMO systems. Specifically, we obtained BER
results for the Neumann series detection [7], the CG-based detection [10], and the Gauss-Seidel (GS)-based
detection [13] techniques. An exact linear MMSE equalizer is included as a reference. Figure 1 shows the
simulated BER versus signal-to-noise ratio (SNR) of these techniques after three iterations, for N = 128
antennas, U = 8 users, and SNR values ranging from -10 to 20 dB. Figure 1 illustrates that the proposed
technique, which is based on the LRE-CG method, approaches the performance of the MMSE algorithm
while consistently delivering the lowest BER among the iterative algorithms from the literature. Table 1
compares the average CPU times of the various techniques. As shown in Table 1, the computation time of the
proposed approach is comparable to that of the other algorithms, while its BER is lower. The CG technique
and the Richardson method are also simpler than the other algorithms. However, with the introduction of
increasingly powerful computing platforms, such as graphics processing units (GPUs), detection accuracy
has gained in importance.

Figures 2 and 3 show the performance of the proposed LRE-CG-based algorithm and the other
methods discussed above for N = 128 and U = 16, with $Itr_{\max} = 3$ and $Itr_{\max} = 4$, respectively. As
seen in Figures 2 and 3, the proposed method comes close to the performance of the MMSE algorithm while
outperforming other algorithms reported in the literature. Table 2 compares the average computation times of
the methods. As shown in Table 2, the computation time of the proposed technique is comparable to that of
the other methods, while its BER performance is better.


Figure 1. BER of the proposed approximate technique compared with alternative approaches for calculating
LLRs in a massive MU-MIMO uplink system, for N = 128 antennas, U = 8 users, $Itr_{\max} = 3$, 64-QAM,
and different SNRs


Table 1. Average computational times for each method (in sec) for the U x N = 8 x 32 case
| | MMSE | CG | SOR | Neumann | Richardson | GS | LRE-CG |
| $Itr_{\max} = 3$ | 1.03e-04 | 8.15e-05 | 1.31e-04 | 1.26e-04 | 5.41e-05 | 1.31e-04 | 5.53e-05 |


Next, Figure 4 compares the BER performance of the proposed algorithm with that of the SOR
technique, the GS-based approach, the Neumann-based algorithm, and other algorithms from the literature in
a variety of scenarios. The proposed method performs well across a range of antenna and user
configurations. As the number of iterations increases, the BER performance of all the conventional iterative
techniques approaches that of the MMSE algorithm. However, for a comparable number of iterations, the
proposed technique outperforms the other approaches. Figure 4 also shows how the BER performance of the
proposed method varies with the number of BS antennas for a given number of users. As N grows, the
performance of the MMSE technique improves correspondingly. The proposed technique attains the
performance of the exact MMSE algorithm regardless of the number of antennas with only a few iterations,
such as three. In contrast, the performance of the GS-based and Neumann-based algorithms improves as the
number of iterations increases, but a non-negligible performance loss remains. According to this comparison,
the other standard algorithms in the literature are inferior to the proposed method. The Neumann series
approach also performs well in the N x U = 128 x 8 scenario, which supports the observation in [12] that this
method requires a high ratio of BS antennas to users ($\rho = N/U$).
Figure 2. BER of the proposed approximate technique compared with alternative approaches for calculating
LLRs, for a system with 128 antennas and 16 users, using $Itr_{\max} = 3$

Figure 3. BER of the proposed approximate technique compared with alternative approaches for calculating
LLRs, for a system with 128 antennas and 16 users, using $Itr_{\max} = 4$


Table 2. Average computational times for each method (in sec) for the U x N = 16 x 128 case (rows: $Itr_{\max} = 3$ and $Itr_{\max} = 4$)
| MMSE | CG | SOR | Neumann | Richardson | GS | LRE-CG |
| 1.44e-04 | 1.23e-04 | 2.84e-04 | 4.12e-04 | 1.27e-04 | 2.83e-04 | 1.26e-04 |
Figure 4. BER performance comparison in the massive MIMO uplink for (a) U x N = 16 x 64 with
$Itr_{\max} = 3$, (b) U x N = 16 x 64 with $Itr_{\max} = 5$, (c) U x N = 16 x 32 with $Itr_{\max} = 3$,
(d) U x N = 16 x 32 with $Itr_{\max} = 4$, (e) U x N = 24 x 128 with $Itr_{\max} = 4$, and
(f) U x N = 8 x 64 with $Itr_{\max} = 3$
5. CONCLUSION

In conclusion, the proposed detection scheme with approximate LLR calculation is robust to changes
in channel correlation and loading factor. Our numerical results show that, for relatively high ratios between
the numbers of base station antennas and user antennas, the proposed detection strategy rapidly approaches
the performance of exact detection. The proposed methodology therefore delivers performance comparable to
that of an exact inversion method while requiring, in many cases, less computational complexity.
Furthermore, the proposed scheme outperforms the approximate Neumann series inversion and other
schemes suggested in the literature in terms of both performance and complexity. The proposed detector is
efficient and can be used with a variety of antenna configurations in large MIMO systems.